Abstract

We describe the CoNLL-2000 shared task: 
dividing text into syntactically related non- 
overlapping groups of words, so-called text 
chunking. We give background information on 
the data sets, present a general overview of the 
systems that have taken part in the shared task 
and briefly discuss their performance. 

1 Introduction 

Text chunking is a useful preprocessing step
for parsing. There has been considerable interest
in recognizing non-overlapping noun phrases
(Ramshaw and Marcus (1995) and follow-up
papers), but relatively little has been written about
identifying phrases of other syntactic categories.
The CoNLL-2000 shared task attempts to fill 
this gap. 

2 Task description 

Text chunking consists of dividing a text into
phrases in such a way that syntactically related
words become members of the same phrase.
These phrases are non-overlapping, which means
that one word can only be a member of one
chunk. Here is an example sentence:

[np He ] [vp reckons ] [np the current 
account deficit ] [vp will narrow ] 
[pp to ] [np only £ 1.8 billion ] 
[pp in ] [np September ] . 

Chunks have been represented as groups of 
words between square brackets. A tag next to 
the open bracket denotes the type of the chunk. 
As far as we know, there are no annotated cor- 
pora available which contain specific informa- 
tion about dividing sentences into chunks of 
words of arbitrary types. We have chosen to 
work with a corpus with parse information, the
Wall Street Journal (WSJ) part of the Penn
Treebank II corpus (Marcus et al., 1993), and to
extract chunk information from the parse trees 
in this corpus. We will give a global description 
of the various chunk types in the next section. 

3 Chunk Types 

The chunk types are based on the syntactic
category part (i.e. without function tag) of the
bracket label in the Treebank (cf. Bies (1995),
p. 35). Roughly, a chunk contains everything to
the left of and including the syntactic head of 
the constituent of the same name. Some Tree- 
bank constituents do not have related chunks. 
The head of S (simple declarative clause) for ex- 
ample is normally thought to be the verb, but 
as the verb is already part of the VP chunk, no 
S chunk exists in our example sentence. 

Besides the head, a chunk also contains pre- 
modifiers (like determiners and adjectives in 
NPs), but no postmodifiers or arguments. This 
is why the PP chunk only contains the preposi- 
tion, and not the argument NP, and the SBAR 
chunk consists of only the complementizer. 

There are several difficulties when converting 
trees into chunks. In the simplest case, a
chunk is just a syntactic constituent without 
any further embedded constituents, like the NPs 
in our examples. In some cases, the chunk con- 
tains only what is left after other chunks have 
been removed from the constituent, cf. "(VP 
loves (NP Mary))" above, or ADJPs and PPs 
below. We will discuss some special cases dur- 
ing the following description of the individual 
chunk types. 

3.1 NP 

Our NP chunks are very similar to the ones of
Ramshaw and Marcus (1995). Specifically,
possessive NP constructions are split in front of
the possessive marker (e.g. [np Eastern Airlines ]
[np ' creditors ] ) and the handling of
coordinated NPs follows the Treebank annotators.
However, as Ramshaw and Marcus do not
describe the details of their conversion algorithm,
results may differ in difficult cases, e.g.
involving NAC and NX.¹

An ADJP constituent inside an NP con- 
stituent becomes part of the NP chunk: 

(NP The (ADJP most volatile) form)
-> [np the most volatile form ]

3.2 VP 

In the Treebank, verb phrases are highly embed- 
ded; see e.g. the following sentence which con- 
tains four VP constituents. Following Ramshaw 
and Marcus' V-type chunks, this sentence will 
only contain one VP chunk: 

((S (NP-SBJ-3 Mr. Icahn) (VP may 
not (VP want (S (NP-SBJ *-3) (VP to 
(VP sell ...))))) . )) 
-> [np Mr. Icahn ] [vp may not want
to sell ] ...

It is still possible, however, to have one VP chunk
directly follow another: [np The impression ]
[np I ] [vp have got ] [vp is ] [np they ] [vp 'd
love to do ] [prt away ] [pp with ] [np it ] . In this
case the two VP constituents did not overlap in
the Treebank.

Adverbs/adverbial phrases become part of 
the VP chunk (as long as they are in front of 
the main verb): 

(VP could (ADVP very well) (VP
show ... ))
-> [vp could very well show ] ...



In contrast to Ramshaw and Marcus (1995),
predicative adjectives of the verb are not part
of the VP chunk, e.g. in "[np they ] [vp are ]
[adjp unhappy ]".

In inverted sentences, the auxiliary verb is not 
part of any verb phrase in the Treebank. Con- 
sequently it does not belong to any VP chunk: 

((S (SINV (CONJP Not only) does
(NP-SBJ-1 your product) (VP have (S
(NP-SBJ *-1) (VP to (VP be (ADJP-PRD
excellent)))))) , but ...
-> [conjp Not only ] does [np your
product ] [vp have to be ] [adjp
excellent ] , but ...

1 E.g. (NP-SBJ (NP Robin Leigh-Pemberton) , (NP
(NAC Bank (PP of (NP England))) governor) ,) which
we convert to [np Robin Leigh-Pemberton ] , Bank
[pp of ] [np England ] [np governor ] whereas Ramshaw
and Marcus state that '"governor" is not included in
any baseNP chunk'.

3.3 ADVP and ADJP 

ADVP chunks mostly correspond to ADVP
constituents in the Treebank. However, ADVPs
inside ADJPs, or inside VPs if in front of the main
verb, are assimilated into the ADJP or VP chunk,
respectively. On the other hand, ADVPs that
contain an NP make two chunks:

(ADVP-TMP (NP a year) earlier) 
-> [np a year ] [advp earlier ] 

ADJPs inside NPs are assimilated into the NP. 
And parallel to ADVPs, ADJPs that contain an 
NP make two chunks: 

(ADJP-PRD (NP 68 years) old) 
-> [np 68 years ] [adjp old ]

It would be interesting to see how changing
these decisions (as can be done in the
Treebank-to-chunk conversion script²)
influences the chunking task.

3.4 PP and SBAR 

Most PP chunks just consist of one word (the 
preposition) with the part-of-speech tag IN. 
This does not mean, though, that finding PP 
chunks is completely trivial. INs can also con- 
stitute an SBAR chunk (see below) and some 
PP chunks contain more than one word. This 
is the case with fixed multi-word prepositions 
such as such as, because of, due to, with prepo- 
sitions preceded by a modifier: well above, just 
after, even in, particularly among or with coor- 
dinated prepositions: inside and outside. We 
think that PPs behave sufficiently differently
from NPs in a sentence that we do not want to group
them into one class (as Ramshaw and Marcus
did in their N-type chunks), and that on the 
other hand tagging all NP chunks inside a PP 
as I-PP would only confuse the chunker. We 
therefore chose not to handle the recognition of 
true PPs (prep.+NP) during this first chunking 
step. 



2 The Treebank-to-chunk conversion script is available
from http://ilk.kub.nl/~sabine/chunklink/

SBAR chunks mostly consist of one word (the
complementizer) with the part-of-speech tag IN, 
but like multi-word prepositions, there are also 
multi-word complementizers: even though, so 
that, just as, even if, as if, only if. 

3.5 CONJP, PRT, INTJ, LST, UCP 

Conjunctions can consist of more than one word 
as well: as well as, instead of, rather than, not 
only, but also. One-word conjunctions (like and,
or) are not annotated as CONJP in the Treebank,
and consequently do not form CONJP chunks
in our data.

The Treebank uses the PRT constituent to 
annotate verb particles, and our PRT chunk 
does the same. The only multi-word particle 
is on and off. This chunk type should be easy 
to recognize as it should coincide with the part- 
of-speech tag RP, but through tagging errors it 
is sometimes also assigned IN (preposition) or 
RB (adverb). 

INTJ is an interjection phrase/chunk like no, 
oh, hello, alas, good grief!. It is quite rare. 

The list marker LST is even rarer. Examples 
are 1., 2., 3., first, second, a, b, c. It might con- 
sist of two words: the number and the period. 

The UCP chunk is reminiscent of the UCP 
(unlike coordinated phrase) constituent in the 
Treebank. Arguably, the conjunction is the 
head of the UCP, so most UCP chunks consist 
of conjunctions like and and or. UCPs are the 
rarest chunks and are probably not very useful 
for other NLP tasks. 

3.6 Tokens outside 

Tokens outside any chunk are mostly punctua- 
tion signs and the conjunctions in ordinary coor- 
dinated phrases. The word not may also be out- 
side of any chunk. This happens in two cases: 
Either not is not inside the VP constituent in 
the Treebank annotation e.g. in 

... (VP have (VP told (NP-1 clients) 
(S (NP-SBJ *-1) not (VP to (VP ship
(NP anything)))))) 

or not is not followed by another verb (because 
the main verb is a form of to be). As the right 
chunk boundary is defined by the chunk's head, 
i.e. the main verb in this case, not is then in fact 
a postmodifier and as such not included in the 
chunk: "... [sbar that ] [np there ] [vp were ]
n't [np any major problems ] ."



3.7 Problems 

All chunks were automatically extracted from 
the parsed version of the Treebank, guided by 
the tree structure, the syntactic constituent la- 
bels, the part-of-speech tags and by knowledge 
about which tags can be heads of which con- 
stituents. However, some trees are very complex 
and some annotations are inconsistent. How,
for instance, should we treat a VP in which the
main verb is tagged as NN (common noun)? Either
we allow NNs as heads of VPs (not very elegant,
but this is what we did) or we have a VP without
a head. The first solution might also introduce
errors elsewhere. As Ramshaw and Marcus
(1995) already noted: "While this automatic
derivation process introduced a small percent- 
age of errors on its own, it was the only practi- 
cal way both to provide the amount of training 
data required and to allow for fully-automatic 
testing." 



4 Data and Evaluation 

For the CoNLL shared task, we have chosen
to work with the same sections of the Penn
Treebank as the widely used data set for base
noun phrase recognition (Ramshaw and Marcus,
1995): WSJ sections 15-18 of the Penn
Treebank as training material and section 20
as test material³. The chunks in the data
were selected to match the descriptions in the
previous section. An overview of the chunk
types in the training data can be found in
Table 1. The data sets contain tokens (words and
punctuation marks), information about the
location of sentence boundaries and information
about chunk boundaries. Additionally, a
part-of-speech (POS) tag was assigned to each token
by a standard POS tagger (Brill (1994) trained
on the Penn Treebank). We used these POS
tags rather than the Treebank ones in order to
make sure that the performance rates obtained
for this data are realistic estimates for data for
which no treebank POS tags are available.

In our example sentence in Section 2, we have
used brackets for encoding text chunks. In the 
data sets we have represented chunks with three 
types of tags: 



3 The text chunking data set is available at
http://lcg-www.uia.ac.be/conll2000/chunking/

count   %     type
55081   51%   NP (noun phrase)
21467   20%   VP (verb phrase)
21281   20%   PP (prepositional phrase)
 4227    4%   ADVP (adverb phrase)
 2207    2%   SBAR (subordinated clause)
 2060    2%   ADJP (adjective phrase)
  556    1%   PRT (particles)
   56    0%   CONJP (conjunction phrase)
   31    0%   INTJ (interjection)
   10    0%   LST (list marker)
    2    0%   UCP (unlike coordinated phrase)

Table 1: Number of chunks per phrase type
in the training data (211727 tokens, 106978
chunks).

B-X   first word of a chunk of type X
I-X   non-initial word in an X chunk
O     word outside of any chunk



This representation type is based on a repre- 
sentation proposed by Ramshaw and Marcus 
(1995) for noun phrase chunks. The three tag 
groups are sufficient for encoding the chunks in 
the data since these are non-overlapping. Using 
these chunk tags makes it possible to approach 
the chunking task as a word classification task. 
We can use chunk tags for representing our ex- 
ample sentence in the following way: 

He/B-NP reckons/B-VP the/B-NP
current/I-NP account/I-NP
deficit/I-NP will/B-VP narrow/I-VP
to/B-PP only/B-NP £/I-NP
1.8/I-NP billion/I-NP in/B-PP
September/B-NP ./O

The output of a chunk recognizer may contain 
inconsistencies in the chunk tags in case a word 
tagged I-X follows a word tagged O or I-Y, with 
X and Y being different. These inconsistencies 
can be resolved by assuming that such I-X tags 
start a new chunk. 
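
The repair rule can be folded into the step that reads chunks off a tag sequence. The sketch below is a minimal illustration, not the official evaluation software: the function name `tags_to_chunks` and the span format (type, start, end with an exclusive end index) are assumptions made for this example.

```python
def tags_to_chunks(tags):
    """Return (type, start, end) spans, end exclusive, from B-X/I-X/O tags.

    An I-X tag that follows O or a tag of another type Y is treated as
    the start of a new chunk, as described in the text.
    """
    chunks = []
    start, current = None, None  # start index and type of the open chunk
    for i, tag in enumerate(tags):
        prefix, _, chunk_type = tag.partition("-")
        continues = prefix == "I" and chunk_type == current
        if not continues:
            # Close the open chunk, if any
            if current is not None:
                chunks.append((current, start, i))
            if tag == "O":
                start, current = None, None
            else:
                # B-X, or an inconsistent I-X: open a new chunk here
                start, current = i, chunk_type
    if current is not None:
        chunks.append((current, start, len(tags)))
    return chunks


print(tags_to_chunks(["B-NP", "I-NP", "O", "I-VP", "I-NP", "B-NP"]))
```

In the usage line, `I-VP` follows `O` and `I-NP` follows `I-VP`, so both open new chunks rather than being discarded.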

The performance on this task is measured 
with three rates. First, the percentage of 
detected phrases that are correct (precision). 
Second, the percentage of phrases in the 
data that were found by the chunker (recall). 
And third, the $F_{\beta=1}$ rate, which is equal to
$(\beta^2+1) \cdot precision \cdot recall / (\beta^2 \cdot precision + recall)$
with $\beta=1$ (van Rijsbergen, 1975). The latter
rate has been used as the target for optimization⁴.
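
The three rates can be computed directly from two sets of chunks. The following sketch is an illustration under stated assumptions, not the official scoring script: chunks are represented as (type, start, end) tuples, and a detected chunk counts as correct only if its type and both boundaries match a chunk in the data. The names `f_beta` and `evaluate` are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """F rate as defined in the text; 0.0 when the denominator is 0."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (b2 + 1) * precision * recall / denominator


def evaluate(gold_chunks, predicted_chunks, beta=1.0):
    """Compare two collections of (type, start, end) chunk spans."""
    gold, predicted = set(gold_chunks), set(predicted_chunks)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall, beta)


gold = [("NP", 0, 2), ("VP", 2, 3), ("NP", 3, 5), ("PP", 5, 6)]
predicted = [("NP", 0, 2), ("VP", 2, 3), ("NP", 3, 4)]
precision, recall, f1 = evaluate(gold, predicted)
print(f"precision={precision:.2%} recall={recall:.2%} F={f1:.4f}")
```

With $\beta=1$ the rate reduces to the harmonic mean of precision and recall, so a system cannot obtain a high score by maximizing only one of the two.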

5 Results 

The eleven systems that have been applied to
the CoNLL-2000 shared task can be divided into
four groups:

1. Rule-based systems: Vilain and Day;
Johansson; Dejean.

2. Memory-based systems: Veenstra and Van 
den Bosch. 

3. Statistical systems: Pla, Molina and Pri- 
eto; Osborne; Koeling; Zhou, Tey and Su. 

4. Combined systems: Tjong Kim Sang; Van 
Halteren; Kudoh and Matsumoto. 



Vilain and Day (2000) approached the shared
task in three different ways. The most successful
was an application of the Alembic parser
which uses transformation-based rules. Johansson
(2000) uses context-sensitive and context-free
rules for transforming part-of-speech (POS)
tag sequences to chunk tag sequences. Dejean
(2000) has applied the theory refinement system
ALLiS to the shared task. In order to obtain
a system which could process XML formatted
data while using context information, he
has used three extra tools. Veenstra and Van
den Bosch (2000) examined different parameter
settings of a memory-based learning algorithm.
They found that the modified value difference
metric applied to POS information only
worked best.

A large number of the systems applied to
the CoNLL-2000 shared task use statistical
methods. Pla, Molina and Prieto (2000) use
a finite-state version of Markov Models. They
started by using POS information only and
obtained a better performance when lexical
information was used. Zhou, Tey and Su
(2000) implemented a chunk tagger based on
HMMs. The initial performance of the tagger
was improved by a post-process correction
method based on error driven learning and by
incorporating chunk probabilities generated by


4 In the literature about related tasks sometimes the 
tagging accuracy is mentioned as well. However, since 
the relation between tag accuracy and chunk precision 
and recall is not very strict, tagging accuracy is not a 
good evaluation measure for this task. 

test data                    precision  recall   $F_{\beta=1}$
Kudoh and Matsumoto          93.45%     93.51%   93.48
Van Halteren                 93.13%     93.51%   93.32
Tjong Kim Sang               94.04%     91.00%   92.50
Zhou, Tey and Su             91.99%     92.25%   92.12
Dejean                       91.87%     92.31%   92.09
Koeling                      92.08%     91.86%   91.97
Osborne                      91.65%     92.23%   91.94
Veenstra and Van den Bosch   91.05%     92.03%   91.54
Pla, Molina and Prieto       90.63%     89.65%   90.14
Johansson                    86.24%     88.25%   87.23
Vilain and Day               88.82%     82.91%   85.76
baseline                     72.58%     82.14%   77.07

Table 2: Performance of the eleven systems on the test data. The baseline results have been
obtained by selecting the most frequent chunk tag for each part-of-speech tag.



a memory-based learning process. The two
other statistical systems use maximum-entropy
based methods. Osborne (2000) trained
Ratnaparkhi's maximum-entropy POS tagger to
output chunk tags. Koeling (2000) used a standard
maximum-entropy learner for generating
chunk tags from words and POS tags. Both
have tested different feature combinations before
finding an optimal one and their final results
are close to each other.

Three systems use system combination. 
Tjong Kim Sang (2000) trained and tested five
memory-based learning systems to produce
different representations of the chunk tags. A
combination of the five by majority voting
performed better than the individual parts. Van
Halteren (2000) used Weighted Probability
Distribution Voting (WPDV) for combining the
results of four WPDV chunk taggers and a
memory-based chunk tagger. Again the
combination outperformed the individual systems.
Kudoh and Matsumoto (2000) created 231 support
vector machine classifiers to predict the
unique pairs of chunk tags. The results of the
classifiers were combined by a dynamic
programming algorithm.
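
Majority voting, the simplest of the combination methods mentioned above, can be sketched in a few lines. This is a generic illustration and not a reconstruction of any participant's system: it assumes that every component system outputs one chunk tag per token in the same tag representation, and it breaks ties in favour of the system listed first (both are assumptions of this example). The function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine tag sequences (one per system, equal length) token by token."""
    voted = []
    for tags_at_position in zip(*predictions):
        # most_common keeps first-seen order among equal counts, so ties
        # go to the tag proposed by the earliest system in the list.
        voted.append(Counter(tags_at_position).most_common(1)[0][0])
    return voted


system_outputs = [
    ["B-NP", "I-NP", "O"],
    ["B-NP", "B-NP", "O"],
    ["B-NP", "I-NP", "B-VP"],
]
print(majority_vote(system_outputs))
```

Voting per token can produce tag sequences that none of the component systems proposed, including an I-X tag after O, so the repair rule from Section 4 still applies to the combined output.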

The performance of the systems can be found
in Table 2. A baseline performance was obtained
by selecting the chunk tag most frequently
associated with a POS tag. All systems
outperform the baseline. The majority of the
systems reached an $F_{\beta=1}$ score between 91.50
and 92.50. Two approaches performed a lot
better: the combination system WPDV used by
Van Halteren and the Support Vector Machines
used by Kudoh and Matsumoto.
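
The baseline is simple enough to state as code. The sketch below is a minimal illustration on toy data, not the script that produced the baseline row of Table 2: the function names, the (word, POS tag, chunk tag) triple format and the fallback tag O for POS tags unseen in training are assumptions of this example.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Map each POS tag to the chunk tag it occurs with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for _word, pos, chunk_tag in sentence:
            counts[pos][chunk_tag] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def apply_baseline(model, pos_tags, default="O"):
    """Tag a sentence from its POS tags alone; unseen POS tags get `default`."""
    return [model.get(pos, default) for pos in pos_tags]


# Toy training data, invented for illustration only
train = [
    [("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP"), ("the", "DT", "B-NP"),
     ("current", "JJ", "I-NP"), ("deficit", "NN", "I-NP"), (".", ".", "O")],
    [("Deficits", "NNS", "B-NP"), ("narrow", "VBP", "B-VP"), ("in", "IN", "B-PP"),
     ("the", "DT", "B-NP"), ("autumn", "NN", "I-NP"), (".", ".", "O")],
    [("Trade", "NN", "B-NP"), ("fell", "VBD", "B-VP"), (".", ".", "O")],
]
model = train_baseline(train)
print(apply_baseline(model, ["DT", "NN", "VBD", "UH"]))
```

Because the baseline ignores the words and the context, it tags every NN the same way even when the noun opens a chunk, which is consistent with its precision being well below its recall in Table 2.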

6 Related Work 



In the early nineties, Abney (1991) proposed
to approach parsing by starting with finding
related chunks of words. By then, Church
(1988) had already reported on recognition
of base noun phrases with statistical methods.
Ramshaw and Marcus (1995) approached
chunking by using a machine learning method.
Their work has inspired many others to study
the application of learning methods to noun
phrase chunking⁵. Other chunk types have not
received the same attention as NP chunks. The
most complete work is Buchholz et al. (1999),
which presents results for NP, VP, PP, ADJP
and ADVP chunks. Veenstra (1999) works with
NP, VP and PP chunks. Both he and Buchholz
et al. use data generated by the script that
produced the CoNLL-2000 shared task data sets.
Ratnaparkhi (1998) has recognized arbitrary
chunks as part of a parsing task but did not
report on the chunking performance. Part of the
Sparkle project has concentrated on finding
various sorts of chunks for the different languages
(Carroll et al., 1997).



5 An elaborate overview of the work done on noun
phrase chunking can be found on
http://lcg-www.uia.ac.be/~erikt/research/np-chunking.html

7 Concluding Remarks

We have presented an introduction to the 
CoNLL-2000 shared task: dividing text into 
syntactically related non-overlapping groups of 
words, so-called text chunking. For this task we 
have generated training and test data from the 
Penn Treebank. This data has been processed 
by eleven systems. The best performing system 
was a combination of Support Vector Machines 
submitted by Taku Kudoh and Yuji Matsumoto. 
It obtained an $F_{\beta=1}$ score of 93.48 on this task.